Abstract

A generalization of recent group-theoretic matrix multiplication algorithms
to an analogue of the theory of partial matrix multiplication is presented.
We demonstrate that the added flexibility of this approach can in some cases
improve upper bounds on the exponent of matrix multiplication yielded by
group-theoretic full matrix multiplication. The group theory behind our
partial matrix multiplication algorithms leads to the problem of maximizing a
quantity representing the "fullness" of a given partial matrix pattern. This
problem is shown to be NP-hard, and two algorithms, one optimal and another
non-optimal but polynomial-time, are given for solving it.
1 Introduction

In 1969, Volker Strassen showed that the naive algorithm for square matrix
multiplication, which takes $O(n^3)$ time to multiply matrices of dimension $n$,
is not optimal [8]; the algorithm he presented multiplied matrices in
$O(n^{\log_2 7}) \approx O(n^{2.807})$. Together with the simple lower bound of
$\Omega(n^2)$ on the number of multiplications needed to multiply $n \times n$
matrices, Strassen's result originated the problem of determining the "best
possible" exponent of matrix multiplication. To be precise, if $M(n)$ is the
number of field operations in characteristic 0 required to multiply two
$n \times n$ matrices, Strassen made the first step towards determining

$$\omega = \inf\{\, r \in \mathbb{R} \mid M(n) = O(n^r) \,\},$$

the exponent of matrix multiplication.

Gradual improvements were made to the upper bound on $\omega$. In 1990,
Coppersmith and Winograd [4] showed that $\omega < 2.38$, a bound which remains
the world record. A promising group-theoretic approach was presented by Cohn
and Umans in 2003. They described an embedding of matrices into a group
algebra that would allow for fast convolution via a Fourier transform, in much
the same way that polynomials can be multiplied efficiently by embedding them
in a group algebra, applying an FFT and then performing the convolution in
the frequency domain. The challenge was to find an appropriate group together
with three subsets which serve to index the matrix entries in the embedding.
Using this method, Cohn et al. [5] tied the record of $\omega < 2.38$.

Proving a tight upper bound on $\omega$ is a long-standing open problem in
theoretical computer science. It is widely believed that $\omega = 2$, but no
progress has been made on the best known upper bound in nearly two decades.

In this paper, we generalize the results of Cohn et al., which only deal with
full matrix multiplication, to a theory of group-theoretic partial matrix
multiplication and use this approach to prove bounds on $\omega$. In particular,
Theorem 2.12 states that

$$\omega \le \frac{3 \log\left(\sum_i d_i^{\omega}\right)}{\log f(A)},$$

where the $d_i$ are the character degrees of the chosen group and $f(A)$
represents, roughly, the amount of information computed in the product of two
partial matrices of a particular "pattern."

The group theory behind our partial matrix multiplication algorithm leads
to an additional computational challenge, namely optimizing the quantity $f(A)$
given a set of possible patterns. We show this problem to be NP-hard, and
describe a non-optimal but polynomial-time algorithm, as well as an
exponential-time algorithm for solving it. In a particular case, we show how to
improve an upper bound on $\omega$ obtained in [2] by using the greater
generality of group-theoretic partial matrix multiplication.
2 Full and Partial Group-Theoretic Matrix Multiplication

Our main theorems describe an algorithm for multiplying matrices using triples
of subsets not satisfying the triple product property (see Definition 2.4). Some
entries must be set to zero, and then partial matrix multiplications are
performed. This section introduces the original group-theoretic algorithm by
Cohn and Umans [3], as well as the notion of "aliasing", the motivation for our
focus on partial matrix multiplication.

2.1 Full Multiplication: The Cohn-Umans Algorithm 

Definition 2.1. If $S, T, U$ are ordered subsets of a group $G$, then the
Cohn-Umans algorithm [3] for matrix multiplication computes the product of
matrices $M$ and $N$ of dimensions $|S| \times |T|$ and $|T| \times |U|$,
respectively, as follows.

Index the rows of $M$ by $S^{-1}$, the columns of $M$ by $T$, the rows of $N$ by
$T^{-1}$, and the columns of $N$ by $U$. Then let

$$f_M = \sum_{i,j} M_{i,j}\, s_i^{-1} t_j \quad\text{and}\quad f_N = \sum_{j,k} N_{j,k}\, t_j^{-1} u_k.$$

Compute $f_P = f_M f_N$, and assign to $P_{i,k}$ the coefficient of
$s_i^{-1} u_k$ in $f_P$.

Theorem 2.2. The Cohn-Umans algorithm computes, in position $(i, k)$ of the
product matrix, the sum of all terms $M_{i',j} N_{j',k'}$, where

$$s_{i'}^{-1} t_j t_{j'}^{-1} u_{k'} = s_i^{-1} u_k.$$

Proof. Every term in $f_P$ is a product of a term in $f_M$ with a term in $f_N$.
The $s_i^{-1} u_k$ term is exactly the sum of all terms $(zm)(z'n)$, where
$z, z' \in \mathbb{C}$, $m \in S^{-1}T$ and $n \in T^{-1}U$, and
$mn = s_i^{-1} u_k$. But this is exactly the sum in the statement of the
theorem. ■

Corollary 2.3. The Cohn-Umans algorithm is correct if and only if for all
$s, s' \in S$, $t, t' \in T$, $u, u' \in U$, we have that
$s s'^{-1} t t'^{-1} u u'^{-1} = e$ implies $s = s'$, $t = t'$, $u = u'$.

Proof. This result follows from the previous theorem, since under this
condition

$$s_{i'}^{-1} t_j t_{j'}^{-1} u_{k'} = s_i^{-1} u_k$$

implies $i = i'$, $j = j'$, $k = k'$, meaning that entry $(i, k)$ of the product
only contains terms formed by multiplying entry $(i, j)$ by $(j, k)$ in the left
and right factor matrices, respectively. ■

Definition 2.4. The property in Corollary 2.3 is called the triple product
property [3].

Example 2.5. The following sets in
$D_{12} = \langle x, y \mid x^6 = y^2 = 1,\ xy = yx^{-1} \rangle$ have the
triple product property:

$$S = \{1, y\}$$

$$T = \{1, yx^2, x^3, xy\}$$

$$U = \{1, yx\}$$
Thus, $S$, $T$, and $U$ can be used to index the product of a full $2 \times 4$
matrix by a full $4 \times 2$ matrix, with no errors.
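Definition 2.1 and Example 2.5 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: $D_{12}$ is taken to be the dihedral group of order 12 with $x^6 = y^2 = 1$ and $xy = yx^{-1}$, an element $y^r x^a$ is encoded as the pair `(r, a)`, and the index sets are taken to be $S = \{1, y\}$, $T = \{1, yx^2, x^3, xy\}$, $U = \{1, yx\}$. The function names (`mul`, `inv`, `cohn_umans`) are hypothetical helpers, not notation from the text.

```python
from itertools import product

ORDER = 6  # assumption: x has order 6, so the group has 12 elements

def mul(g, h):
    """Multiply y^r x^a by y^s x^b, using the relation x^a y = y x^(-a)."""
    (r, a), (s, b) = g, h
    return ((r + s) % 2, ((-a if s else a) + b) % ORDER)

def inv(g):
    """Reflections y x^a are involutions; rotations invert their exponent."""
    r, a = g
    return (r, a) if r else (0, -a % ORDER)

E, X, Y = (0, 0), (0, 1), (1, 0)

def power(g, k):
    out = E
    for _ in range(k):
        out = mul(out, g)
    return out

S = [E, Y]
T = [E, mul(Y, power(X, 2)), power(X, 3), mul(X, Y)]
U = [E, mul(Y, X)]

def cohn_umans(M, N, S, T, U):
    """Embed M and N in the group algebra, multiply, and read off the product."""
    f_M, f_N, f_P = {}, {}, {}
    for i, j in product(range(len(S)), range(len(T))):
        g = mul(inv(S[i]), T[j])
        f_M[g] = f_M.get(g, 0) + M[i][j]
    for j, k in product(range(len(T)), range(len(U))):
        g = mul(inv(T[j]), U[k])
        f_N[g] = f_N.get(g, 0) + N[j][k]
    # Naive convolution; a real implementation would use a Fourier transform.
    for (g, c), (h, d) in product(f_M.items(), f_N.items()):
        gh = mul(g, h)
        f_P[gh] = f_P.get(gh, 0) + c * d
    return [[f_P.get(mul(inv(S[i]), U[k]), 0) for k in range(len(U))]
            for i in range(len(S))]

M = [[1, 2, 3, 4], [5, 6, 7, 8]]
N = [[1, 2], [3, 4], [5, 6], [7, 8]]
P = cohn_umans(M, N, S, T, U)
```

Under these assumptions `P` agrees with the ordinary $2 \times 4$ by $4 \times 2$ matrix product. The convolution here is quadratic in the group order; it is meant to show the indexing, not the speedup.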

In this way, Cohn and Umans reduced the problem of proving bounds on $\omega$
to that of searching for groups with a good combination of character degrees
and subsets satisfying the triple product property. It is, however, unnecessary
to require that the group element index sets produce a fully correct product.
Even when terms in the group algebra multiplication incorrectly appear in an
entry of the product matrix due to a violation of the triple product property by
our chosen subsets $S$, $T$, and $U$ (we call this phenomenon aliasing to
emphasize the analogy to the usual Fourier transform in signal processing),
these index sets will still compute the correct product in the case where one
of the input entries contributing to each aliasing term contains a zero.

In the next section, we show how to apply the classical theory of partial
matrix multiplication to the group-theoretic framework developed by Cohn et al.
We will present bounds on $\omega$ realizable through subsets which may or may
not satisfy the triple product property; in a special case, we can show that our
algorithm yields strictly stronger results than the original Cohn-Umans full
matrix multiplication algorithm. For a specific family of constructions
satisfying the triple product property, the associated bound on $\omega$ can be
improved by adding a single element to each of the sets, as described in
Section 4. This means that the additional information computed by increasing the
matrix dimensions outweighs the information lost due to the partial nature of
the larger multiplication.

2.2 Partial Multiplication: Aliasing 

Definition 2.6. If $S, T, U$ are subsets of a group $G$, the set of all triples
$((i, j), (j', k), (i', k'))$ where

$$s_i^{-1} t_j t_{j'}^{-1} u_k = s_{i'}^{-1} u_{k'}$$

and $i \ne i'$, $j \ne j'$, or $k \ne k'$ is called the set of aliasing triples,
$A$.

Aliasing sets can be visualized as sets of lines representing the triples as
shown in Figure 1. Each line is broken up into two pieces: the first runs from
the left factor matrix to the right factor matrix and represents which pair of
input entries combine to produce an incorrect term in the product; the second
runs from the right factor matrix to the product, indicating where the incorrect
term appears.

Definition 2.7. The left aliasing set of a set of aliasing triples $A$ is

$$\{\, x : \text{there exist } y, z \text{ such that } (x, y, z) \in A \,\}.$$

The right aliasing set and the product aliasing set are defined analogously. The
left aliasing set is the set of indices in the left factor matrix in Figure 1
that are the endpoints of one of the lines.
Figure 1: A visualization of the aliasing set in Example 2.10 where the input
matrices are the left and middle rectangles, and the output is on the right. A
triple $((i, j), (j', k), (i', k'))$ corresponds to a pair of lines from
$(i, j)$ in the left factor matrix to $(j', k)$ in the right, and from
$(j', k)$ in the right factor matrix to $(i', k')$ in the product; the set of
all $(i, j)$ which are the start of a line in the diagram is the left aliasing
set.



It is impossible to have only one of $i \ne i'$, $j \ne j'$, $k \ne k'$ (if, for
example, only $i \ne i'$ held, then we would have
$s_i^{-1} e u_k = s_{i'}^{-1} u_k$). Thus, an incorrect term in the Cohn-Umans
algorithm will only occur having at least two of

1. being in the wrong row given its first multiplicand,

2. being in the wrong column given its second multiplicand, or

3. having its multiplicands coming from different positions in their respective
row and column.

Definition 2.8. Let $A$ be a set of aliasing triples for $S, T, U \subseteq G$.
We say that $I$ and $J$ cover $A$ if $I$ and $J$ are subsets of the indices of
entries of a $|S| \times |T|$ and a $|T| \times |U|$ matrix, respectively, such
that for all $a$ in $A$, either the first entry of $a$ is in $I$ or the second
is in $J$. If $M$ and $N$ are $|S| \times |T|$ and $|T| \times |U|$ matrices
such that for every index $i$ in $I$, $M_i$ is 0, and similarly for $N$ and $J$,
we say that $M$ and $N$ realize $I$ and $J$.

Theorem 2.9. Let $G$ be a group and let $S, T, U$ be indexing sets with aliasing
set $A$. Let $M, N$ be matrices of size $|S| \times |T|$ and $|T| \times |U|$,
respectively, and let $I, J$ be subsets of the indices that cover $A$. If
$M, N$ realize $I, J$, then the Cohn-Umans algorithm correctly computes the
partial matrix product $MN$.



Proof. By Theorem 2.2 the extra terms arise from entries in the input matrices
with indices in the aliasing set $A$. Thus setting the entries corresponding to
entries of $I$ and $J$ to zero sets the coefficient on each incorrect term to
zero, yielding the correct product of the partial matrices $M, N$. ■
Example 2.10. Consider our earlier example in $D_{12}$, with a change to the
last element of $T$:

$$S = \{1, y\}$$

$$T = \{1, yx^2, x^3, x^4\}$$

$$U = \{1, yx\}.$$

This triple has aliasing set

$$A = \{((2,4),(3,2),(1,1)),\ ((2,4),(3,1),(1,2)),\ ((1,4),(3,2),(2,1)),\ ((1,4),(3,1),(2,2))\},$$

as depicted in Figure 1. The first element of $A$ describes the indices in the
product

$$s_2^{-1} t_4 t_3^{-1} u_2 = s_1^{-1} u_1$$

that erroneously form an extra term in the top left corner of the product matrix.
Thus, using these sets, the Cohn-Umans algorithm correctly computes these
types of partial matrix multiplication:



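$$
\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & 0 \\ a_{2,1} & a_{2,2} & a_{2,3} & 0 \end{bmatrix}
\begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \\ b_{3,1} & b_{3,2} \\ b_{4,1} & b_{4,2} \end{bmatrix},
\qquad
\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \end{bmatrix}
\begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \\ 0 & 0 \\ b_{4,1} & b_{4,2} \end{bmatrix}.
$$

The aliasing triples are visually depicted in Figure 1.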
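The aliasing set of Example 2.10 can be recomputed by brute force from Definition 2.6. The sketch below makes several assumptions that are not spelled out in the text: $D_{12}$ has order 12 (so $x^6 = 1$), $y^r x^a$ is encoded as `(r, a)`, and the modified index set is $T = \{1, yx^2, x^3, x^4\}$, chosen because it reproduces exactly the four triples listed above. The helper names are hypothetical.

```python
from itertools import product

ORDER = 6  # assumption: dihedral group of order 12

def mul(g, h):
    """(y^r x^a)(y^s x^b), using x^a y = y x^(-a)."""
    (r, a), (s, b) = g, h
    return ((r + s) % 2, ((-a if s else a) + b) % ORDER)

def inv(g):
    r, a = g
    return (r, a) if r else (0, -a % ORDER)

def rot(a):      # x^a
    return (0, a % ORDER)

def refl(a):     # y x^a
    return (1, a % ORDER)

def aliasing_triples(S, T, U):
    """All ((i,j),(j',k),(i',k')), 1-indexed, satisfying Definition 2.6."""
    triples = set()
    ranges = (range(len(S)), range(len(T)), range(len(T)),
              range(len(U)), range(len(S)), range(len(U)))
    for i, j, j2, k, i2, k2 in product(*ranges):
        lhs = mul(mul(mul(inv(S[i]), T[j]), inv(T[j2])), U[k])
        rhs = mul(inv(S[i2]), U[k2])
        if lhs == rhs and (i, j, k) != (i2, j2, k2):
            triples.add(((i + 1, j + 1), (j2 + 1, k + 1), (i2 + 1, k2 + 1)))
    return triples

S = [rot(0), refl(0)]
U = [rot(0), refl(1)]
T_full = [rot(0), refl(2), rot(3), refl(5)]     # xy = y x^5; assumed Example 2.5 set
T_partial = [rot(0), refl(2), rot(3), rot(4)]   # last element changed to x^4

A = aliasing_triples(S, T_partial, U)
```

With these choices `aliasing_triples(S, T_full, U)` is empty, which is the triple product property, while `A` contains the four triples of Example 2.10.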

We will now introduce a function that will be an integral part of our partial
matrix multiplication algorithm. It computes the number of ones in a tensor of
partial matrix multiplication, which intuitively means the amount of information
computed by this partial multiplication. Its importance will become clear in the
next theorem.

Definition 2.11. Let $A$ be a set of aliasing triples and let $I$ and $J$ cover
$A$. The function $f(I, J)$ is equal to

$$f(I, J) = \sum_i k_i n_i,$$

where $k_i$ is the number of entries in the $i$th column of the left factor
matrix which do not appear in $I$ and $n_i$ is the number of entries in the
$i$th row of the right factor matrix which do not appear in $J$. Finally,
$f(A)$ is

$$f(A) = \max\{\, f(I, J) \mid I \text{ and } J \text{ cover } A \,\}.$$
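To make Definition 2.11 concrete, the following Python sketch evaluates $f(I, J)$ for the two zeroing patterns of Example 2.10. It assumes the reading $f(I, J) = \sum_i k_i n_i$, with entries given as 1-indexed `(row, column)` pairs; the function name `f_value` and its signature are illustrative, not part of the text.

```python
def f_value(I, J, m, n, p):
    """Count the ones in the partial tensor for an m x n by n x p product.

    I: zeroed entries (row, col) of the left matrix.
    J: zeroed entries (row, col) of the right matrix.
    """
    free_in_col = [m] * n   # k_i: free entries in column i of the left matrix
    free_in_row = [p] * n   # n_i: free entries in row i of the right matrix
    for _, col in I:
        free_in_col[col - 1] -= 1
    for row, _ in J:
        free_in_row[row - 1] -= 1
    return sum(k * r for k, r in zip(free_in_col, free_in_row))

# The two covers of Example 2.10 (2 x 4 times 4 x 2).
zero_left = f_value({(1, 4), (2, 4)}, set(), 2, 4, 2)
zero_right = f_value(set(), {(3, 1), (3, 2)}, 2, 4, 2)
no_zeroing = f_value(set(), set(), 2, 4, 2)
```

Both covers lose one full column-row pairing, so each yields 12 of the 16 scalar products of the unrestricted multiplication.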
The function $f$ is a measure of how much computation is being done by a
partial matrix multiplication. Notice that if there is no zeroing in a
multiplication of an $m \times n$ matrix by an $n \times p$ matrix, then taking
$I$ and $J$ both empty gives $f \ge mnp$ (and it is easy to see that
$f = mnp$). The following theorem is used to derive many of our results; it
provides a bound on $\omega$ given subsets which need not satisfy the triple
product property. For this proof, it is sufficient to consider only matrices of
complex numbers. Note that in the special case where the aliasing set is empty
(that is, $S, T, U$ have the triple product property), $f(A) = |S||T||U|$ and
our bound recovers Theorem 1.8 in [2]. The proof mimics that of Theorem 4.1 in
[3], and uses some of its terminology.

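Theorem 2.12. Let $S, T, U \subseteq G$ with aliasing triples $A$, and suppose
$G$ has character degrees $\{d_i\}$. Then

$$\omega \le \frac{3 \log\left(\sum_i d_i^{\omega}\right)}{\log f(A)}.$$

Proof. Let $t$ be the tensor of partial matrix multiplication corresponding to
$I, J$, the patterns which maximize $f$. It is clear that

$$t \le \mathbb{C}[G] \cong \bigoplus_i \langle d_i, d_i, d_i \rangle$$

(similar to Theorem 2.3 in [3]). Then the $l$th tensor power of $t$ satisfies

$$t^{\otimes l} \le \bigoplus_{i_1, \ldots, i_l} \langle d_{i_1} \cdots d_{i_l},\ d_{i_1} \cdots d_{i_l},\ d_{i_1} \cdots d_{i_l} \rangle.$$

By the definition of $\omega$, each
$\langle d_{i_1} \cdots d_{i_l},\ d_{i_1} \cdots d_{i_l},\ d_{i_1} \cdots d_{i_l} \rangle$
has rank at most $C (d_{i_1} \cdots d_{i_l})^{\omega + \varepsilon}$ for some
$C$ and for all $\varepsilon$. So, taking the rank of both sides gives

$$R(t)^l \le C \left( \sum_i d_i^{\omega + \varepsilon} \right)^l,$$

from Proposition 15.1 in [1]. Since this is true for all $\varepsilon > 0$, it
holds for $\varepsilon = 0$ by continuity:

$$R(t)^l \le C \left( \sum_i d_i^{\omega} \right)^l.$$

Taking $l$th roots as $l \to \infty$ gives

$$R(t) \le \sum_i d_i^{\omega}.$$

By Theorem 4.1 in [7],

$$\omega \le \frac{3 \log\left(\sum_i d_i^{\omega}\right)}{\log f(A)}.$$ ■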
3 Algorithms for Aliasing Structure

In the study of aliasing, the following problem comes up: there is a pattern
$A$, and one wishes to find the value $f(A)$ by trying various $I, J$. This
problem is NP-hard; this section describes the worst-case exponential-time
algorithm we use to solve it exactly, as well as a polynomial-time algorithm
used to find a reasonable solution.

3.1 A Polynomial-Time Non-Optimal Algorithm for Finding Aliasing Covers

In this section we give a polynomial-time algorithm for finding covering
sets $I, J$. This is not an approximation algorithm in the complexity-theoretic
sense; it is merely a "pretty good" algorithm which we found useful in research.
Instead of finding the cover which maximizes $f$, we find the cover which zeros
the fewest entries. Viewing the entries in the factor matrices as vertices in a
bipartite graph, and the pairs in the aliasing set as edges, it is clear that we
desire a minimum vertex cover. By König's theorem, this is equivalent to finding
a maximum matching (for an excellent explanation of the associated algorithm,
see [5]), which can be solved efficiently in bipartite graphs.
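One way to realize the minimum-vertex-cover step is sketched below in Python. It is not the authors' implementation: it uses a simple augmenting-path matching followed by the standard alternating-path construction behind König's theorem, and the function name `min_vertex_cover` and the edge encoding (pairs of a left-matrix index and a right-matrix index) are assumptions made for illustration.

```python
def min_vertex_cover(edges):
    """Return (I, J): a minimum vertex cover of a bipartite graph.

    edges: iterable of (left, right) pairs, e.g. the first two components
    of each aliasing triple.
    """
    adj = {}
    for left, right in edges:
        adj.setdefault(left, []).append(right)
    match_right = {}  # right vertex -> left vertex it is matched to

    def augment(u, seen):
        # Try to match u, re-routing earlier matches along an augmenting path.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        augment(u, set())

    # Alternating search from the unmatched left vertices.
    matched_left = set(match_right.values())
    reach_left = {u for u in adj if u not in matched_left}
    reach_right = set()
    stack = list(reach_left)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in reach_right:
                reach_right.add(v)
                w = match_right.get(v)
                if w is not None and w not in reach_left:
                    reach_left.add(w)
                    stack.append(w)

    # Koenig: unreached left vertices plus reached right vertices.
    return set(adj) - reach_left, reach_right

# Left/right pairs taken from the aliasing triples of Example 2.10.
pairs = [((2, 4), (3, 2)), ((2, 4), (3, 1)), ((1, 4), (3, 2)), ((1, 4), (3, 1))]
I, J = min_vertex_cover(pairs)
```

For this input the cover has two vertices, matching the size of a maximum matching. As the text notes, a cover of minimum size need not maximize $f$.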

3.2 Computing the Optimal Cover for Aliasing 

When computing $f$ by exhaustive search, one must choose, for each aliasing
triple, whether to satisfy it by zeroing the left or by zeroing the right. After
each choice, however, one can compute the current value of $f$ as if the only
triples in $A$ were those already assigned a zero. Making further choices
will only lower this value of $f$, so if the computed value is not above the
already known best value, the entire subtree can be pruned. In pseudocode,

procedure maximum_f(A)
    S = new Stack
    F = new Frame(A)   # F stores A, the set of aliasing triples, and
                       # I and J, the trial patterns, currently empty
    S.push(F)
    bestf = -1
    bestfFrame = F
    while S is not empty
        frame = S.pop()
        if every triple in A is covered by frame.I and frame.J and
                f(frame.I, frame.J) > bestf then
            bestf = f(frame.I, frame.J)
            bestfFrame = frame
            continue
        if f(frame.I, frame.J) <= bestf then continue   # don't need this subtree
        a = first triple in A not covered by frame.I, frame.J
        frame1 = copy(frame)
        frame2 = copy(frame)
        frame1.I.append(left entry of a)
        frame2.J.append(right entry of a)
        S.push(frame1, frame2)
    return bestf, bestfFrame
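The pseudocode above translates fairly directly into Python. The version below is a sketch under a few assumptions: only the left and right components of each aliasing triple are passed in (the product position does not affect covering), entries are 1-indexed `(row, column)` pairs, and $f(I, J)$ is read as $\sum_i k_i n_i$. The names `f_value` and `maximum_f` are illustrative.

```python
def f_value(I, J, m, n, p):
    """f(I, J) for an m x n by n x p partial product (see Definition 2.11)."""
    free_in_col = [m] * n
    free_in_row = [p] * n
    for _, col in I:
        free_in_col[col - 1] -= 1
    for row, _ in J:
        free_in_row[row - 1] -= 1
    return sum(k * r for k, r in zip(free_in_col, free_in_row))

def maximum_f(pairs, m, n, p):
    """Depth-first branch and bound over covers; returns (f(A), (I, J))."""
    best, best_cover = -1, None
    stack = [(frozenset(), frozenset())]
    while stack:
        I, J = stack.pop()
        value = f_value(I, J, m, n, p)
        if value <= best:
            continue  # zeroing more entries can only lower f: prune
        uncovered = next(
            ((l, r) for l, r in pairs if l not in I and r not in J), None)
        if uncovered is None:
            best, best_cover = value, (I, J)
            continue
        left, right = uncovered
        stack.append((I | {left}, J))   # cover it by zeroing on the left
        stack.append((I, J | {right}))  # or by zeroing on the right
    return best, best_cover

pairs = [((2, 4), (3, 2)), ((2, 4), (3, 1)), ((1, 4), (3, 2)), ((1, 4), (3, 1))]
best, cover = maximum_f(pairs, 2, 4, 2)
```

For the aliasing set of Example 2.10 this returns 12, attained by zeroing either the fourth column on the left or the third row on the right. The worst case remains exponential in the number of aliasing pairs.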

4 Improving a Group-Theoretic Construction through Aliasing

In this section we present an improvement over a construction presented in §2
of [2].

4.1 The Original Construction 

Let

$$H = C_n \times C_n \times C_n$$

and let $H_i < H$ be the subgroup isomorphic to $C_n$ in the $i$th coordinate.
By $z$ we mean the generator of $S_2$, and by $e_H$ we mean the identity element
of $H$. We write elements of $G$ as

$$(a, b) z^j$$

where $a, b \in H$ and $j = 0$ or $1$.

Define, for $i \in \{1, 2, 3\}$, the subsets of $G$

$$S_i = \{\, (a, b) z^j \mid a \in H_i \setminus \{e_H\},\ b \in H_{i+1},\ j = 0 \text{ or } 1 \,\}$$

where subscripts are taken mod 3. Finally, we let

$$S = S_1, \quad T = S_2, \quad U = S_3.$$

By [2], Lemma 2.1, $S, T, U$ have the triple product property. Note that

$$|S| = |T| = |U| = 2n(n-1),$$

and so

$$f = 8 n^3 (n-1)^3.$$

This construction gives $\omega < 2.9088$ for $n = 17$.

4.2 Relaxing the Triple Product Property 

Let $S_i$ be as defined in the previous section, and let

$$S_i' = S_i \cup \{(e_H, e_H)\}.$$

Let $S' = S_1'$, $T' = S_2'$, $U' = S_3'$, and let $A$ be the associated
aliasing set, shown graphically in Figure 2.

We find that $A$ can be partitioned into three easily analyzed categories:

(a) "Bottom aliasing" occurs in the rows of the product which are not indexed
by the identity. All aliasing of this type can be covered by zeroing some
$(n-1)^2$ entries in the $(e_H, e_H)$ column of $R$.

(b) "Top-Easy aliasing" occurs in the $(e_H, e_H)$ row of the product. These are
entirely covered by zeroing $(n-1)^2$ entries of the $(e_H, e_H)$ column of $L$.

(c) "Top-Hard aliasing" also occurs in the $(e_H, e_H)$ row of the product. The
distinction is in the manner in which they arise. Aliasing in this category can
be covered by two things: the same entries which cover Top-Easy aliasing,
combined with an additional $2n(n-1)$ entries in the $(e_H, e_H)$ column of $R$.

This decomposition is depicted in Figure 3.

There exists a pair $I, J$ with $(n-1)^2$ elements in the first column in $L$,
and the entire first column in $R$, that cover $A$. Thus

$$f \ge (2n(n-1))^3 + (2n(n-1))^2 + (2n(n-1)) \left[ 2n(n-1) - (n-1)^2 + 1 \right],$$

which is strictly greater than $f$ for $S, T, U$. For $n = 17$, we achieve
$\omega < 2.9084$.

The insight here is that we only zeroed entries that we added. That is, this
partial matrix multiplication contains the entire matrix multiplication indexed
by $S, T, U$, and then some more. Thus, by relaxing the restriction on
$S, T, U$, we strictly increased the amount of computation done, without
increasing the work necessary (since $G$ is constant).
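The two numerical bounds quoted for $n = 17$ can be approximated by solving $\omega \log f = 3 \log \sum_i d_i^{\omega}$ for $\omega$. The Python sketch below does this by bisection. It relies on assumptions that are not stated in this section: $G$ is taken to be the semidirect product $H^2 \rtimes S_2$ with $|H| = n^3$, and its character degrees are taken to be $2|H|$ characters of degree 1 and $|H|(|H|-1)/2$ of degree 2 (a standard fact for such a group when $H$ is abelian, not a claim from the text). The function names are hypothetical.

```python
import math

def omega_bound(char_degrees, f):
    """Solve w * log(f) = 3 * log(sum mult * d**w) for w in [2, 3].

    char_degrees: list of (degree, multiplicity) pairs.
    """
    def gap(w):
        total = sum(mult * d ** w for d, mult in char_degrees)
        return w * math.log(f) - 3 * math.log(total)

    lo, hi = 2.0, 3.0
    if gap(hi) < 0:
        return hi  # the inequality gives no information below 3
    for _ in range(60):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi

def section4_bounds(n):
    """Bounds from the original and the relaxed construction (assumed G)."""
    h = n ** 3                                  # |H|
    degrees = [(1, 2 * h), (2, h * (h - 1) // 2)]
    m = 2 * n * (n - 1)                         # |S| = |T| = |U|
    f_full = m ** 3
    f_partial = m ** 3 + m ** 2 + m * (m - (n - 1) ** 2 + 1)
    return omega_bound(degrees, f_full), omega_bound(degrees, f_partial)

original, relaxed = section4_bounds(17)
```

Under these assumptions both values land close to the figures quoted above, with the relaxed construction giving the smaller one. The improvement comes entirely from the larger value of $f$, since the group and its character degrees are unchanged.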

5 The Complexity of Computing a Best Cover 

Often we are confronted with this problem: given some triple of subsets, find
the best way to put aliasing in the factor matrices and have the best bound on
$\omega$, i.e., the best $f(I, J)$. We show this problem is computationally hard.

Consider the problem PARTIAL-TENSOR-ONES: given the dimensions $m, n, p$ of
two matrices, a set of pairs $A = \{((a_i, b_i), (c_i, d_i))\}$, and an integer
$k$, are there $I$ and $J$ covering $A$ such that $f(I, J) = k$? (This is the
problem of maximizing the dot product when all the aliasing is to be taken care
of by the left and right matrices.) We show that PARTIAL-TENSOR-ONES is
NP-complete via a reduction from INDEPENDENT-SET, a well-known NP-complete
problem.

Figure 2: A visualization of the aliasing in the construction introduced in
Section 4.2. In this case, $n = 2$.

Figure 3: A visualization of the three types of aliasing in the construction
given in Section 4.2. Panel (a): Bottom Aliasing.

Theorem 5.1. PARTIAL-TENSOR-ONES is NP-complete.

Proof. An instance of INDEPENDENT-SET consists of (some encoding of) a
graph and an integer $k$. Let $G = (V, E)$ be this graph. We will generate an
instance of PARTIAL-TENSOR-ONES. Let $m = p = 1$ and $n = |V|$, so that the
left factor is a row vector with entries $(1, i)$ and the right factor is a
column vector with entries $(i, 1)$. For each edge $(v_i, v_j)$, add constraints
of the form $((1, i), (j, 1))$ and $((1, j), (i, 1))$.

Suppose there is an independent set of size $k$. Then there are $I, J$ such that
$f(I, J) = k$: for each $v_i$ in the independent set, allow $(i, 1)$ and
$(1, i)$ to be free and zero all other entries in the two vectors. It is clear
that $f(I, J) = k$, and every constraint is fulfilled because the constraints
correspond exactly to the edges, so no two free variables appear in the same
constraint.

Conversely, from an aliasing pattern with $f(I, J) = k$, we can construct an
independent set of the same size. If any $(1, i)$ is free in $I$ while $(i, 1)$
is zeroed in $J$, modify $I$ to set $(1, i)$ to zero. Then the value of $f$ is
unchanged, but all pairs are either both free or both 0. This is the sort of
aliasing pattern one gets from the previous reduction, and we can easily run the
argument of the previous paragraph backwards to find an independent set in $G$
of size $k$.

Since there is an independent set of size $k$ if and only if there are some
$I, J$ such that $f(I, J) = k$, and the reduction is clearly polynomial time,
PARTIAL-TENSOR-ONES is NP-hard.

To show that PARTIAL-TENSOR-ONES is in NP, we must exhibit a polynomial-sized
certificate which can be checked in polynomial time. Given an instance of
PARTIAL-TENSOR-ONES, a certificate can be a list of symbols, each one of L, R,
or B, one for each constraint, indicating whether that constraint is satisfied
by zeroing on the left, on the right, or in both. This is clearly polynomial in
the size of the input. To check the certificate, one only needs to check two
conditions: first, that it is consistent, that is, that no pair of constraints
on the same entry of the matrix constrain it to be both free and zero, which
takes a number of checks quadratic in the number of constraints; and second,
that $f(I, J) \ge k$, which can be done by making a list of rows and columns
with zeroed entries, and for each of these the number of nonfree entries in that
row or column. Then $f$ can be computed from this easily. This takes time
proportional to the number of constraints as well. So, the certificate can be
verified in polynomial time. Therefore, PARTIAL-TENSOR-ONES is NP-complete. ■
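The forward direction of the reduction in the proof above can be illustrated with a small Python sketch. It assumes the left factor is a $1 \times |V|$ row vector and the right factor a $|V| \times 1$ column vector (so that $f$ is a dot product of 0/1 "free" indicators), with vertices numbered from 1; the function names are hypothetical and the example graph is made up for illustration.

```python
def reduction(num_vertices, edges):
    """Build a PARTIAL-TENSOR-ONES instance (m, n, p, constraints) from a graph."""
    constraints = []
    for i, j in edges:
        constraints.append(((1, i), (j, 1)))
        constraints.append(((1, j), (i, 1)))
    return 1, num_vertices, 1, constraints

def pattern_from_vertex_set(num_vertices, chosen):
    """Zero every entry whose vertex is not in the chosen set."""
    others = [v for v in range(1, num_vertices + 1) if v not in chosen]
    I = {(1, v) for v in others}
    J = {(v, 1) for v in others}
    return I, J

def covers(I, J, constraints):
    return all(left in I or right in J for left, right in constraints)

def f_value(I, J, m, n, p):
    free_in_col = [m] * n
    free_in_row = [p] * n
    for _, col in I:
        free_in_col[col - 1] -= 1
    for row, _ in J:
        free_in_row[row - 1] -= 1
    return sum(k * r for k, r in zip(free_in_col, free_in_row))

# Illustrative graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
m, n, p, constraints = reduction(4, edges)

I_good, J_good = pattern_from_vertex_set(4, {1, 4})   # independent set
I_bad, J_bad = pattern_from_vertex_set(4, {1, 2})     # adjacent vertices
```

The independent set $\{1, 4\}$ yields a valid cover with $f = 2$, while the adjacent pair $\{1, 2\}$ leaves the constraints for edge $(1, 2)$ uncovered.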

Remark: We have not shown, in the reduction, a group (and appropriate
subsets) which provides the appropriate aliasing. So, any polynomial-time
algorithm to find the best aliasing pattern from a given group and triple of
subsets must either use more group theory, or show that P = NP.

6 Conclusion 

We have shown that an analogue of the Cohn-Umans algorithm can be applied to
indexing sets that do not satisfy the triple product property, and provided some
techniques for addressing the resulting optimization problems. In particular, we
take sets satisfying the property and modify them in a small way to achieve a
smaller upper bound on $\omega$. As the group-theoretic approach is known to tie
the best-known upper bound, this suggests a possible path to improving upon the
current record.